A system for retrieving broadcast news speech documents using voice input keywords and similarity between words

Authors

  • Hiromitsu Nishizaki
  • Seiichi Nakagawa
Abstract

This paper describes a robust speech document retrieval system that takes spoken keywords as input. Speech input inevitably introduces recognition errors. To cope with them, a novel method was developed in which, before retrieval, unproductive keyword candidates are discarded by a grouping process that uses the similarity between words and the recognition scores of the keywords. In retrieval experiments, we used the proposed method to retrieve Japanese broadcast news documents from spoken keyword input and showed its effectiveness.
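
The abstract does not spell out the grouping procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that each recognized keyword candidate carries a recognition score in [0, 1], that a word-similarity function is available, and that a candidate is "unproductive" when it is similar to no other candidate and its recognition score is low. The names `group_candidates`, `filter_candidates`, `toy_similarity`, both thresholds, and the toy data are all hypothetical.

```python
from itertools import combinations


def group_candidates(candidates, similarity, sim_threshold):
    """Group candidates: two words share a group if a chain of
    pairwise similarities >= sim_threshold connects them (union-find)."""
    parent = {w: w for w in candidates}

    def find(w):
        # Follow parent links to the group representative.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(candidates, 2):
        if similarity(a, b) >= sim_threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in candidates:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


def filter_candidates(scores, similarity, sim_threshold=0.5, score_threshold=0.8):
    """Assumed rule: keep every candidate that is grouped with another word;
    keep an isolated candidate only if its recognition score is high."""
    kept = []
    for group in group_candidates(list(scores), similarity, sim_threshold):
        if len(group) > 1:
            kept.extend(group)
        else:
            kept.extend(w for w in group if scores[w] >= score_threshold)
    return sorted(kept)


# Hypothetical toy data: recognition scores and one similar word pair.
SCORES = {"election": 0.9, "vote": 0.6, "erection": 0.4, "tokyo": 0.85}
SIMILAR_PAIRS = {frozenset(("election", "vote")): 0.7}


def toy_similarity(a, b):
    """Look up a symmetric similarity; unrelated pairs score 0.0."""
    return SIMILAR_PAIRS.get(frozenset((a, b)), 0.0)


print(filter_candidates(SCORES, toy_similarity))
```

In this toy run the misrecognized candidate "erection" is dropped because it is isolated and has a low score, while "vote" survives its modest score because it groups with "election". A real system would derive the similarity from corpus statistics and tune both thresholds on held-out data.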

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Title generation for spoken broadcast news using a training corpus

The problem of title generation involves finding the essence of a document and expressing it in only a few words. The results of a query to the Informedia Digital Video Library are summarized through an automatically generated title for each retrieved news story. When the document is errorful, as with speech-recognized broadcast news stories, the title creation challenge becomes even greater. W...

Proper name retrieval from diachronic documents for automatic speech transcription using lexical and temporal context

Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving new proper names from contemporary diachronic text documents. The idea is to use in-vocabulary proper names as an anchor to collect new linked proper names from the diachronic corpus. Our assump...

Automatic Title Generation for Spoken Broadcast News

In this paper, we implemented a set of title generation methods using a training set of 21190 news stories and evaluated them on an independent test corpus of 1006 broadcast news documents, comparing the results over manual transcription to the results over automatically recognized speech. We use both F1 and the average number of correct title words in the correct order as metrics. Overall, the re...

Comparison of different machine learning methods for extractive Persian speech-to-speech summarization without using transcripts

In this paper, extractive speech summarization using different machine learning algorithms was investigated. The task of speech summarization deals with extracting important and salient segments from speech in order to access, search, extract and browse speech files more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...

Publication date: 2000